A graph-theoretic paradigm is used to generalize the common measures of
categorical clustering in free recall based on the number of observed
repetitions. Two graphs are defined: a graph G that characterizes the
a priori structure of the item set defined by a researcher, and a graph R
that characterizes a subject's protocol. Two indices of clustering, denoted
by gamma and omega, are obtained by evaluating the sum of the pairwise
products of the weights on the corresponding edges of the two graphs. The
gamma statistic is a direct generalization of the commonly used clustering
indices and reduces to the number of repetitions whenever G represents a
standard categorical decomposition of a stimulus list. The omega statistic,
on the other hand, extracts more information from the protocol graph than
does gamma and incorporates a distance measure based on the number of
intervening items in a subject's recall sequence. (Author)

INTRODUCTION

The general problem of defining indices of categorical clustering in
free recall has been the focus of extensive research in recent years (for
instance, see Dalrymple-Alford, 1970; Frankel & Cole, 1971; Kelly, 1973;
Roenker, Thompson, & Brown, 1971; and Shuell, 1969). Most of these
contributions discuss alternative statistics that measure the degree to
which a series of responses provided by a subject conforms to a hypothesized
structure within the set consisting of all potential responses. Typically, a
set of words or other stimuli that are assumed to be categorized into
mutually exclusive and exhaustive classes is given to a subject to study
in a randomized order; subsequently, the subject is asked to recall as
many items as possible from memory. An index of clustering quantifies the
amount of correspondence between the subject's protocol and the specific
partition of the items hypothesized by the researcher. If clustering in
recall occurs according to expectations, then the responses of a subject
should be grouped more or less consistently with respect to the a priori
categories that theoretically partition the original stimulus list, and in
particular, there should be a tendency for related items to be recalled
together.

The intent of this paper is not to propose yet another clustering index
as a competitor to the numerous ones already "on the market" (for
illustrations, refer to the papers cited earlier). Instead, we wish to
provide a novel framework within which several of the more popular
clustering indices may be viewed. In the first sections below, a
graph-theoretic characterization of the clustering problem is developed; in
the later sections certain specializations of the general framework are
discussed along with the appropriate statistical inference procedures. As
one further comment, it should be pointed out that the material to follow is
limited to the categorical clustering problem rather than to free recall
clustering in general (cf. Pellegrino's [1971] discussion of the
subjective-organization paradigm).
A GRAPH-THEORETIC PARADIGM

As a convention, suppose S denotes the set of n stimuli {o_1, ..., o_n}
that contains the items presented to a subject. To formalize the underlying
structure of the stimulus set, it is convenient to define a graph G
that has n nodes or points o_1, ..., o_n with an edge or line between each
unordered pair of distinct nodes. A nonnegative weight is attached to each
edge, which for notational purposes will be referred to as q(o_i, o_j), where
o_i and o_j are two distinct arbitrary nodes in S and define a single edge.
The upper portion of Figure 1 illustrates the type of pictorial
representation that may be given for any graph G. In this example, n is 5
and the arbitrary weights for all ten edges are between 0.0 and 1.0, as
might be represented by various numerical association norms.

As a special case, a graph G may be used to represent any categorization
assumed for the set S defined by a partition of S into k object classes
containing n_1, ..., n_k elements, where n_1 + ... + n_k = n. Note that this
case encompasses object classes and their associated elements defined either
in a priori terms (experimenter-defined) or on the basis of subjects'
idiosyncracies (subject-defined), with the latter exemplified by subjects
sorting objects into subject-perceived categories (cf. Mandler, 1967). In the
present context, both types of categorization are considered to
characterize the stimulus structure graph G. For a pair of nodes within the
same object class of the partition, the weight function is defined to be 1.0;
conversely, any edge between two nodes from separate object classes is
assigned a weight of 0.0. For example, the lower portion of Figure 1 shows
how the graph G would appear if five objects (n = 5) belonged to two classes
(k = 2) with three objects, o_1, o_2, and o_3, in one class (n_1 = 3) and two
objects, o_4 and o_5, in the other (n_2 = 2). For convenience, this particular
case will be called the standard interpretation, but clearly, a
categorization defined, say, by overlapping subsets or by a more complex
structure could be characterized in a similar way.
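
The standard interpretation just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name standard_weights is hypothetical, items are numbered from 0 rather than 1, and classes are filled in order, so the class sizes [3, 2] reproduce the five-object example of Figure 1.

```python
def standard_weights(class_sizes):
    """Build the symmetric 0/1 weight matrix q for a partition of n items.

    Items are numbered 0..n-1 and assigned to classes in order, so
    class_sizes=[3, 2] puts items 0, 1, 2 in one class and 3, 4 in the other.
    (The name and the 0-based numbering are illustrative assumptions.)
    """
    labels = [c for c, size in enumerate(class_sizes) for _ in range(size)]
    n = len(labels)
    # Weight 1.0 for two distinct items in the same class, 0.0 otherwise.
    return [[1.0 if i != j and labels[i] == labels[j] else 0.0
             for j in range(n)]
            for i in range(n)]


q = standard_weights([3, 2])
for row in q:
    print(row)
```

A general weight function, such as a table of association norms, would simply replace this matrix with any symmetric nonnegative matrix having zeros on the diagonal.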

In a related manner, the response sequence provided by a subject can be
represented by a second graph R on the node set {o_1, ..., o_n}. For the
graph R the weight attached to an edge is either 1.0 or 0.0, where a 1.0
signifies that the two nodes were recalled sequentially with no intervening
elements. Without loss of generality, it is assumed that all elements of S
are actually recalled, since otherwise the original set S could be redefined
as those elements listed by a subject. Thus, the graph R consists of a single
continuous sequence of lines having weights of 1.0 that passes through
each node once and only once.

Footnote. We do not wish to contest here whether the proper basis for
clustering is the unconditional or the conditional stimulus set (cf. Frender
& Doubilet, 1974). The procedures to be described can be applied in either
case.

Figure 1. Illustration of a graph G on five nodes with nonnegative weights
attached to all edges (upper portion uses a general weight function; lower
portion is a standard interpretation).

One possible measure of correspondence between a subject's recall
sequence and the hypothesized structure is given by the index

$$\Gamma = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} q(o_i,o_j)\,C(o_i,o_j) = \sum_{i<j} q(o_i,o_j)\,C(o_i,o_j),$$

where C(o_i, o_j) is the zero-one weight function characterizing the graph R,
q(o_i, o_j) is as previously defined for G, and q(o_i, o_i) = C(o_i, o_i) = 0
for all i. In the standard interpretation, Γ is merely the number of
repetitions, i.e., the number of node pairs that are recalled sequentially
and belong to the same object class within the hypothesized partition. Since
the number of repetitions or some transform of this quantity is the commonly
used measure of clustering discussed in the literature, the Γ statistic is a
natural generalization. Specifically, a large index Γ results when the node
pairs that are recalled sequentially also have the larger associated weights
on the defined edges in G. Although this discussion will emphasize the index
Γ, an alternative measure will be proposed in a later section that
incorporates more information from a subject's protocol than simple adjacent
responses.
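
As a minimal sketch of the index Γ, the Python fragment below sums the weights q over the pairs of items recalled consecutively, which are exactly the edges of R with weight 1.0. The function name gamma_index, the 0-based item numbering, and the small four-item weight matrix (two categories of two items each) are assumptions made for illustration.

```python
def gamma_index(q, order):
    """Sum q over adjacent pairs in the recall order (the edges of R)."""
    return sum(q[a][b] for a, b in zip(order, order[1:]))


# Hypothetical four-item list: items 0 and 1 form one category,
# items 2 and 3 the other (standard interpretation).
q = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]]

print(gamma_index(q, [0, 1, 2, 3]))  # 2 repetitions
print(gamma_index(q, [0, 2, 1, 3]))  # 0 repetitions
```

Because each adjacent pair is visited once, the sum corresponds to the second form of the definition (the sum over i < j) and needs no factor of 1/2.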

The constant multiplier of 1/2 used in the definition of Γ implies in
an intuitive sense that some type of correction is being made for counting
the same products twice. In particular, if the original index Γ were stated
without the constant multiplier and the weight functions were not assumed to
be symmetric, then a similar index may be defined between two possibly
asymmetric weight functions. The graph G would be characterized by the
presence of two edges between each pair of nodes o_i and o_j, where one edge
is directed from o_i to o_j and weighted by q(o_i, o_j) and the second edge is
placed in an opposite orientation and weighted by q(o_j, o_i), i.e., directed
from o_j to o_i. In a similar way the protocol graph R could be directed; for
instance, each edge that has a weight of 1.0 is matched with an edge between
the same two nodes but with a weight of 0.0 and is directed in the opposite
way. General directed graphs of this type could prove a useful extension if
the order in which the subject provides the recalled nodes is of interest
(e.g., Pellegrino, 1971), but for our purposes only symmetric weight
functions will be considered explicitly.

At this point there are two distinct problems that could be attacked:
(a) normalizing the index Γ to provide a measure of clustering, or (b)
defining a hypothesis-testing procedure for evaluating the size of an
observed Γ. In some instances the second problem subsumes the first, since
many of the more acceptable normalizations require an initial calculation
of statistics that are also needed in hypothesis testing. Nevertheless, a
number of possible normalizations will be presented later that relate
directly to several of the more popular indices already used for the
standard interpretation.
A PERMUTATION DISTRIBUTION FOR Γ

One possible strategy for assessing the correspondence between the two
graphs R and G is to develop a statistical baseline through a randomization
or permutation distribution for the index Γ (for example, see Barton &
David, 1966). Under the assumption that there is no inherent relationship
between a subject's response protocol and the underlying assumed structure
defined by G, each possible permutation of the nodes o_1, ..., o_n is assumed
to have an equally likely chance of occurring a priori as the subject's
response sequence. Since there are n! possible orderings of the n nodes, an
index Γ could be calculated for each such sequence, generating what is
typically called a permutation distribution for Γ. By comparing the observed
value of Γ to this distribution, a precise evaluation may be made as to
whether the observed value of Γ is large enough to reject the hypothesis
that the subject's protocol has no inherent relationship to the researcher's
theoretical categorization. In other words, with respect to the graph G the
following question is raised: Is it reasonable to infer that the subject's
protocol was not chosen at random from the n! possible response sequences?

Clearly there are many difficulties with this formulation, since even
in the event that a subject is responding independently of the assumed
categorization, it is very unlikely that the protocol chosen can be viewed
realistically as an actual random selection from all n! possible response
sequences formed from the list of recalled nodes (for example, see Shuell,
1969). Nevertheless, an inference technique based upon complete
randomization is justified to the extent that response biases, such as
serial position effects, are unrelated to the categorization being tested by
the researcher. There does not appear to be any simple way of making this
obviously vague generalization any more precise that would, at the same
time, allow the development of a very general inference procedure.

Footnote. Response biases of this kind (that act to disturb the nominal
probability levels under the assumption of "equally likely" sequences) may
be counteracted to some extent by the investigator, through such techniques
as block randomization of items representing different categories, the
inclusion of "buffer" items in the first and last few study list positions,
and the insertion of an interpolated-activity interval between study and
test. More complex decision rules could also be devised, such as ignoring
those items in the subject's protocol that occur in exactly the same k
initial or terminal serial positions as on the study list.

As a very elementary example that should provide some clarification,
suppose that a subject recalls four words in the order o_1, o_2, o_3, o_4. The
researcher has assumed that a standard interpretation holds in which the
nodes {o_1, o_2} form one category and {o_3, o_4} form a second. In this
illustration, two edges are present in G with weights of 1.0 between o_1 and
o_2 and between o_3 and o_4; alternatively, in R, three edges are present with
weights of 1.0 defined between each pair of adjacent responses: o_1 and o_2,
o_2 and o_3, and o_3 and o_4. All other edges in both graphs have weights of
0.0. Consequently, the observed value of Γ is 2.0, and the appropriate
permutation distribution is defined by calculating Γ for all 4! = 24
possible response protocols, where each such protocol includes all four
of the nodes (see Table 1).



TABLE 1

A SAMPLE PERMUTATION DISTRIBUTION FOR Γ

    Permutation                                   Γ Value

     1-2:   o_1 o_2 o_3 o_4;  o_4 o_3 o_2 o_1        2
     3-4:   o_1 o_2 o_4 o_3;  o_3 o_4 o_2 o_1        2
     5-6:   o_1 o_3 o_2 o_4;  o_4 o_2 o_3 o_1        0
     7-8:   o_1 o_3 o_4 o_2;  o_2 o_4 o_3 o_1        1
     9-10:  o_1 o_4 o_2 o_3;  o_3 o_2 o_4 o_1        0
    11-12:  o_1 o_4 o_3 o_2;  o_2 o_3 o_4 o_1        1
    13-14:  o_2 o_1 o_3 o_4;  o_4 o_3 o_1 o_2        2
    15-16:  o_2 o_1 o_4 o_3;  o_3 o_4 o_1 o_2        2
    17-18:  o_2 o_3 o_1 o_4;  o_4 o_1 o_3 o_2        0
    19-20:  o_2 o_4 o_1 o_3;  o_3 o_1 o_4 o_2        0
    21-22:  o_3 o_1 o_2 o_4;  o_4 o_2 o_1 o_3        1
    23-24:  o_3 o_2 o_1 o_4;  o_4 o_1 o_2 o_3        1



The probability distribution based on these obtained values of Γ is as
follows:

    Γ     Probability
    0        8/24
    1        8/24
    2        8/24

Within a hypothesis-testing context, the probability of observing a value
of Γ equal to 2 (or larger) is 1/3 under the assumption that the response
protocol is chosen at random from the 4! possible sequences. A larger
value of n would be necessary to provide attainable significance levels
in the traditional ranges of .05 to .01, but obviously, the same paradigm
could be used with a corresponding increase in the required computational
labor.
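
The complete enumeration behind Table 1 can be reproduced with a short Python sketch. It assumes items numbered 0 to 3 (with 0, 1 in one category and 2, 3 in the other) and uses the hypothetical helper gamma_index; exact fractions are used so that the probabilities match the tabled values.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

# Standard interpretation for four items: categories {0, 1} and {2, 3}.
q = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]]


def gamma_index(q, order):
    """Sum q over adjacent pairs in the recall order."""
    return sum(q[a][b] for a, b in zip(order, order[1:]))


# Evaluate the index for all 4! = 24 equally likely protocols.
counts = Counter(gamma_index(q, p) for p in permutations(range(4)))
total = sum(counts.values())
dist = {g: Fraction(c, total) for g, c in sorted(counts.items())}

# Probability of a value at least as large as the observed value of 2.
p_value = sum(pr for g, pr in dist.items() if g >= 2)

print(dist)
print(p_value)
```

Since the number of protocols grows as n!, this brute-force approach is practical only for short lists, which is the computational burden noted above.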

The procedure just described constructs what is called a "conditional
permutation distribution" in the statistical literature, where the term
"conditional" refers to the use of the subject's actual protocol in
identifying the subset of nodes for the construction of the reference
distribution. Inference procedures based upon these ideas form the basis for
much of nonparametric statistics, and in fact, some of the same problems
that appear in applying nonparametric techniques also cause difficulties
in the free recall framework. Specifically, since the permutation
distribution must be generated anew for each particular application,
alternative approaches that bypass complete enumeration must be found.
Generally, two different solutions are attempted in the statistics
literature: the substitution of "scores" (for instance, ranks or normal
deviates) for the original numerical observations that will allow a tabling
of the permutation distribution that suffices for all applications; or
secondly, deriving the mean and variance formulas for the appropriate test
statistic and relying on large sample distributions for hypothesis testing.

Unfortunately, because of the great variability in the types of
categorization structure, the latter alternative is the only possibility
that can be entertained for the free recall problem. Consequently, the next
task is to derive the mean and variance parameters for Γ. For an attempt
to obtain complete probability distributions in the case of a standard
interpretation, the reader should consult Kelly (1973).
IV

MEAN AND VARIANCE FOR Γ

The mean and variance parameters for Γ are easily derived and,
surprisingly, are a special case of a much more general set of expressions
given by Mantel (1967) in the biometrics literature. For convenience,
suppose A_1, A_2, and A_3 are defined as follows:

$$A_1 = \Big(\sum_{i=1}^{n}\sum_{j=1}^{n} q(o_i,o_j)\Big)^2;$$

$$A_2 = \sum_{i=1}^{n}\Big(\sum_{j=1}^{n} q(o_i,o_j)\Big)^2;$$

$$A_3 = \sum_{i=1}^{n}\sum_{j=1}^{n} q^2(o_i,o_j).$$

Then, using this notation,

$$E(\Gamma) = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n} q(o_i,o_j); \qquad [1]$$

$$Var(\Gamma) = \frac{1}{n(n-1)}\,(A_1 - 2A_2) - \frac{1}{n^2}\,A_1 + \frac{1}{n}\,A_3. \qquad [2]$$

For the standard interpretation in which the assumed partition consists
of object classes of sizes n_1, ..., n_k, formulas [1] and [2] reduce
considerably to the forms given in [3] and [4], respectively:

$$E(\Gamma) = \frac{1}{n}\sum_{i=1}^{k} n_i^2 - 1; \qquad [3]$$

$$Var(\Gamma) = \frac{1}{n^2(n-1)}\Big(\sum_{i=1}^{k} n_i^2\Big)^2 - \frac{2}{n(n-1)}\sum_{i=1}^{k} n_i^3 + \frac{n+1}{n(n-1)}\sum_{i=1}^{k} n_i^2 - \frac{n}{n-1}. \qquad [4]$$



In this special case, [3] is the expected number of repetitions and
is identical to the expression derived by Bousfield and Bousfield (1966).
Furthermore, the variance term in [4] is equivalent to a formula used by
Frankel and Cole (1971) and is equal to the variance of the number of runs
in a multiple-type object context since the number of such runs is merely
the complement of the number of repetitions. For the probability
distribution given previously and using formulas [3] and [4], we find that
E(Γ) = 1 and Var(Γ) = 2/3. These values can be verified numerically by
computing the mean and variance of Γ directly from the complete permutation
distribution.
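
The mean and variance of Γ can be computed directly from a weight matrix, or from the class sizes alone in the standard interpretation. The Python sketch below is an illustration, not the authors' procedure: the function names are hypothetical, weights are taken to be integers so that exact fractions can be used, and the variance expressions follow the Mantel special case in terms of A_1, A_2, and A_3 and its reduction for a partition with class sizes n_1, ..., n_k.

```python
from fractions import Fraction


def gamma_moments(q):
    """Mean and variance of Gamma from A_1, A_2, A_3 (general weights)."""
    n = len(q)
    row_sums = [sum(row) for row in q]
    a1 = sum(row_sums) ** 2                      # squared grand total
    a2 = sum(r * r for r in row_sums)            # sum of squared row totals
    a3 = sum(x * x for row in q for x in row)    # sum of squared entries
    mean = Fraction(sum(row_sums), n)
    var = (Fraction(a1 - 2 * a2, n * (n - 1))
           - Fraction(a1, n * n)
           + Fraction(a3, n))
    return mean, var


def gamma_moments_standard(sizes):
    """Mean and variance of Gamma from class sizes (standard interpretation)."""
    n = sum(sizes)
    s2 = sum(m ** 2 for m in sizes)
    s3 = sum(m ** 3 for m in sizes)
    mean = Fraction(s2, n) - 1
    var = (Fraction(s2 * s2, n * n * (n - 1))
           - Fraction(2 * s3, n * (n - 1))
           + Fraction((n + 1) * s2, n * (n - 1))
           - Fraction(n, n - 1))
    return mean, var


# Four-node example: categories of sizes 2 and 2.
q = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]]
print(gamma_moments(q))                 # mean 1, variance 2/3
print(gamma_moments_standard([2, 2]))   # mean 1, variance 2/3
```

Both routes give a mean of 1 and a variance of 2/3 for the four-node example, in agreement with the values obtained from the permutation distribution.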

Because the mean and variance parameters for Γ are available, it is
natural to normalize the index Γ in the following way:

$$Z = \frac{\Gamma - E(\Gamma)}{\sqrt{Var(\Gamma)}}.$$

This normalization "corrects" the observed value of Γ for the amount of
clustering expected for the particular items recalled by the subject.
Following Frankel and Cole (1971), the statistic Z generalizes the type of
deviation measure that Shuell (1969) suggests for an index of clustering
in a standard interpretation. Several other indices are suggested later.
Finally, it should be noted that it seems reasonable to compare this Z index
to a standard normal distribution (given relatively large n) in order to
provide an approximation to the permutation distribution discussed earlier.

Footnote. Although in this section we discuss normalization procedures for a
single subject under a conditional permutation model, a more useful
extension can be developed through an appropriate measure C(o_i, o_j) based on
N protocols. This is presented in a later section.
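
As a small illustration of the Z normalization, the following Python sketch standardizes the observed Γ = 2 from the four-node example using E(Γ) = 1 and Var(Γ) = 2/3, and then evaluates the upper tail of the standard normal distribution. The function names are hypothetical, and with n = 4 the normal comparison is shown only to make the mechanics concrete; the text recommends it for relatively large n.

```python
import math


def z_score(observed, mean, variance):
    """Standardize an observed index by its permutation mean and variance."""
    return (observed - mean) / math.sqrt(variance)


def upper_tail(z):
    """P(Z >= z) for a standard normal variable, via the complementary
    error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


z = z_score(2, 1, 2 / 3)
p_approx = upper_tail(z)
print(round(z, 3), round(p_approx, 3))
```

For this tiny example the normal tail area differs noticeably from the exact permutation probability of 1/3, which is consistent with the caveat that the approximation is intended for larger lists.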
AN ALTERNATIVE INDEX Ω AND SOME EXTENSIONS

The common measures of clustering used in the free recall literature,
including the general measure Γ, depend only upon a minimal amount of
information from an individual subject's protocol. Specifically, only those
node pairs that are recalled sequentially contribute to the measure and
all other pairs contribute nothing, even those that are separated by only
one intervening node within the recall sequence. There is one rather
simple scheme, however, for incorporating additional information from the
subject's protocol by defining an alternative index Ω. Suppose the subject
generates the node sequence o_1, ..., o_n and a proximity function is
defined between any two nodes in the protocol as the number of intervening
nodes plus one. Thus, two nodes that are recalled far apart should have
a large associated proximity function. In particular, define
C(o_r, o_s) = |r - s| and let

$$\Omega = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} q(o_i,o_j)\,C(o_i,o_j) = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} q(o_i,o_j)\,|i-j| = \sum_{i<j} q(o_i,o_j)\,(j-i).$$

If clustering in recall occurs, then two items o_i and o_j within the
same category, or more generally, two items with relatively large values
of q(o_i, o_j), should have small associated function values C(o_i, o_j).
Consequently, the smaller the value of Ω, the more clustering in free recall
occurs according to what is expected considering the weighted graph G.
Fortunately, the mean and variance parameters for Ω are also available as
special cases of the Mantel (1967) formulas:

$$E(\Omega) = \frac{n+1}{6}\sum_{i=1}^{n}\sum_{j=1}^{n} q(o_i,o_j); \qquad [5]$$

$$Var(\Omega) = \frac{n+1}{180}\,\big[-A_1 + (n-4)A_2 + 4(n-1)A_3\big]. \qquad [6]$$
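
The index Ω and its permutation mean and variance can be sketched in Python as follows. The names omega_index and omega_moments are hypothetical, items are numbered from 0, and integer weights are assumed so that exact fractions can be used; the moment expressions are the Mantel special cases written in terms of A_1, A_2, and A_3.

```python
from fractions import Fraction


def omega_index(q, order):
    """Sum of q times the positional separation (j - i) over pairs i < j
    of positions in the recall order."""
    n = len(order)
    return sum(q[order[i]][order[j]] * (j - i)
               for i in range(n) for j in range(i + 1, n))


def omega_moments(q):
    """Permutation mean and variance of Omega from A_1, A_2, A_3."""
    n = len(q)
    row_sums = [sum(row) for row in q]
    a1 = sum(row_sums) ** 2
    a2 = sum(r * r for r in row_sums)
    a3 = sum(x * x for row in q for x in row)
    mean = Fraction((n + 1) * sum(row_sums), 6)
    var = Fraction((n + 1) * (-a1 + (n - 4) * a2 + 4 * (n - 1) * a3), 180)
    return mean, var


# Four items, categories {0, 1} and {2, 3}.
q = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]]
print(omega_index(q, [0, 1, 2, 3]))  # 2: both category pairs are adjacent
print(omega_index(q, [0, 2, 1, 3]))  # 4: each pair is split by one item
print(omega_moments(q))
```

Unlike Γ, smaller values of Ω indicate more clustering, so a perfectly clustered protocol gives the smallest value.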
For the standard interpretation, these two expressions take on the simpler
forms given in [7] and [8]:

$$E(\Omega) = \frac{n+1}{6}\Big(\sum_{i=1}^{k} n_i^2 - n\Big); \qquad [7]$$

$$Var(\Omega) = \frac{n+1}{180}\Big[-\Big(\sum_{i=1}^{k} n_i^2\Big)^2 + 4(n+1)\sum_{i=1}^{k} n_i^2 + (n-4)\sum_{i=1}^{k} n_i^3 - 4n^2\Big]. \qquad [8]$$



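As a simple numerical illustration in the case of a standard
interpretation, the four-node example given previously may also be used to
verify formulas [7] and [8]. In this case, the complete set of permutation
values would be as shown in Table 2.

TABLE 2

A SAMPLE PERMUTATION DISTRIBUTION FOR Ω

    Permutation                                   Ω Value

     1-2:   o_1 o_2 o_3 o_4;  o_4 o_3 o_2 o_1        2
     3-4:   o_1 o_2 o_4 o_3;  o_3 o_4 o_2 o_1        2
     5-6:   o_1 o_3 o_2 o_4;  o_4 o_2 o_3 o_1        4
     7-8:   o_1 o_3 o_4 o_2;  o_2 o_4 o_3 o_1        4
     9-10:  o_1 o_4 o_2 o_3;  o_3 o_2 o_4 o_1        4
    11-12:  o_1 o_4 o_3 o_2;  o_2 o_3 o_4 o_1        4
    13-14:  o_2 o_1 o_3 o_4;  o_4 o_3 o_1 o_2        2
    15-16:  o_2 o_1 o_4 o_3;  o_3 o_4 o_1 o_2        2
    17-18:  o_2 o_3 o_1 o_4;  o_4 o_1 o_3 o_2        4
    19-20:  o_2 o_4 o_1 o_3;  o_3 o_1 o_4 o_2        4
    21-22:  o_3 o_1 o_2 o_4;  o_4 o_2 o_1 o_3        4
    23-24:  o_3 o_2 o_1 o_4;  o_4 o_1 o_2 o_3        4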

The corresponding probability distribution would be:

    Ω     Probability
    2        8/24
    4       16/24

Thus, computing either from formulas [7] and [8] or from the actual
permutation distribution, we find E(Ω) = 10/3 and Var(Ω) = 8/9. A
normalization of the index Ω using the mean and variance may be useful for
interpretation here as well.

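
The agreement between the permutation distribution of Ω and formulas [7] and [8] can be checked with a brief Python sketch. The variable names and the 0-based labeling of the four items are assumptions for illustration; the enumeration itself simply applies the definition of Ω to all 24 orderings.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

sizes = [2, 2]          # class sizes n_1, n_2
labels = [0, 0, 1, 1]   # category of each of the four items


def omega_index(order):
    """Sum of positional separations over same-category pairs."""
    n = len(order)
    return sum(j - i
               for i in range(n) for j in range(i + 1, n)
               if labels[order[i]] == labels[order[j]])


values = [omega_index(p) for p in permutations(range(4))]
counts = Counter(values)

# Moments taken directly from the 24 equally likely protocols.
mean = Fraction(sum(values), len(values))
var = Fraction(sum(v * v for v in values), len(values)) - mean ** 2

# Moments from formulas [7] and [8].
n = sum(sizes)
s2 = sum(m ** 2 for m in sizes)
s3 = sum(m ** 3 for m in sizes)
mean_f = Fraction((n + 1) * (s2 - n), 6)
var_f = Fraction((n + 1) * (-s2 * s2 + 4 * (n + 1) * s2
                            + (n - 4) * s3 - 4 * n * n), 180)

print(dict(counts))
print(mean, var, mean_f, var_f)
```

Both computations give a mean of 10/3 and a variance of 8/9.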

INDICES OF CLUSTERING 



Although hypothesis testing can be approached through an application
of a randomization distribution, a second rather distinct problem still
remains in defining "good" indices of categorical clustering. Exactly
the same difficulty occurs in measuring rank correlation using the number
of rank order inversions as a criterion. Almost all of the suggested
rank correlation measures rely on the same statistic (usually denoted by
S) to test the null hypothesis of no population association (see Hays,
1973, p. 799). Nevertheless, at least five different normalizations of
this basic statistic have been suggested as a way of providing a final
measure of rank correlation, e.g., Somers' asymmetrical d's,
Goodman-Kruskal's gamma, tau_a, tau_b, and tau_c (Somers, 1962).
Consequently, the basic statistic for the standard interpretation free
recall problem defined by the number of repetitions seems to be the natural
analogue of the S statistic of rank correlation; moreover, the desire to
find an adequate index of clustering corresponds directly to the historical
search for a good index of rank correlation.

In our general framework, the indices Γ and Ω play the role of basic
statistics that could be normalized in various ways to provide a final
index of clustering. Several normalizations are suggested in Table 3 that
will reduce for the special case of a standard interpretation to the more
familiar measures discussed in the psychological literature. No attempt
will be made to evaluate the merits of each of these normalizations, and
thus, the reader is urged to consult the sources that are cited for
extensive critiques and theoretical justifications.

Each of the indices given in Table 3 depends upon a number of constants
chosen from the following list:

E(Γ), E(Ω), Var(Γ), Var(Ω), Max(Γ), Max(Ω), Min(Γ), Min(Ω).

All of these quantities have been defined earlier except for the Min and
Max parameters, and these latter bounds can be obtained by a simple
ordering operation. In particular, if the n(n-1)/2 values of q(o_i, o_j) are
ordered from smallest to largest and the n(n-1)/2 values of C(o_i, o_j) are
also ordered from smallest to largest, then one-half of the sum of the
pairwise products of the two entries in the same rank position defines the
maximum value of the index. Similarly, if the n(n-1)/2 values of C(o_i, o_j)
are reordered oppositely from largest to smallest, then one-half of the sum
of the pairwise products defines the minimum index value (Gilmore, 1962). If
a fairly simple structure for the graph G can be identified (e.g., a
standard interpretation) then a closed-form expression for the minimum and
maximum index values may be obtained directly. In general, however, any
application that depends on a rather complex structure in the graph G will
require a separate evaluation of the minimum and maximum index values
through this type of ordering procedure.
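
The ordering operation can be sketched in Python as shown below. Two assumptions should be noted: the hypothetical function index_bounds sorts all n(n-1) off-diagonal entries of each symmetric matrix (every unordered pair therefore appears twice), so that halving the sum agrees with the double-sum definition of the index; and the sorted pairing is not guaranteed to correspond to any single recall order, so the results are perhaps best read as bounds on the index.

```python
def index_bounds(q, c):
    """Lower and upper bounds on (1/2) * sum_ij q_ij * c_ij obtained by
    pairing sorted off-diagonal entries (rearrangement of rank positions)."""
    n = len(q)
    qs = sorted(q[i][j] for i in range(n) for j in range(n) if i != j)
    cs = sorted(c[i][j] for i in range(n) for j in range(n) if i != j)
    # Same rank order: largest q paired with largest c.
    upper = sum(a * b for a, b in zip(qs, cs)) / 2
    # Opposite rank order: largest q paired with smallest c.
    lower = sum(a * b for a, b in zip(qs, reversed(cs))) / 2
    return lower, upper


# Four items, categories {0, 1} and {2, 3}.
q = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]]

# Protocol graph R for Gamma: 1 for adjacent positions, 0 otherwise.
c_gamma = [[1 if abs(i - j) == 1 else 0 for j in range(4)] for i in range(4)]
# Positional separations |i - j| used for Omega.
c_omega = [[abs(i - j) for j in range(4)] for i in range(4)]

print(index_bounds(q, c_gamma))  # (0.0, 2.0)
print(index_bounds(q, c_omega))  # (2.0, 5.0)
```

For Γ in this example the bounds 0 and 2 coincide with the smallest and largest values in Table 1. For Ω the lower bound of 2 is attained, while the upper bound of 5 exceeds the largest value of 4 found by complete enumeration of the four-node example.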
VII

GROUP STATISTICS

The indices of clustering in free recall that have been discussed up
to this point are limited to the protocols of a single subject. However,
there is an immediate generalization of the basic randomization paradigm
that provides a direct extension to group data, or for that matter, to
repeated trials using the same subject. For instance, suppose the stimulus
structure graph G is fixed but we obtain N protocols either from a group of
N subjects or from the same subject over N trials. Each of the N protocols
is defined by a subset of the set of nodes S that define the graph G, and
a proximity measure is constructed in some way between each pair of nodes
o_i and o_j in S. As an illustration, an overall proximity function
C(o_i, o_j) could be obtained by first constructing for each protocol k a
proximity matrix C_k(o_i, o_j) between all node pairs in S and then summing
(and possibly averaging) the N individual proximity functions. For a
specific example, the proximity function C_k(o_i, o_j) for protocol k could
be defined as

$$C_k(o_i,o_j) = \begin{cases} 1 & \text{if } o_i \text{ and } o_j \text{ are recalled consecutively in protocol } k; \\ 0 & \text{otherwise.} \end{cases}$$




In this case, if C(o_i, o_j) = Σ_k C_k(o_i, o_j), then the overall proximity
between o_i and o_j is the number of protocols in which o_i and o_j were
recalled sequentially. Thus, with this interpretation, larger values of
C(o_i, o_j) correspond to the more similar objects. As an alternative
possibility, suppose that protocol k contains n_k recalled items and we
define:



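$$C_k(o_i,o_j) = \begin{cases} |i-j| & \text{if both } o_i \text{ and } o_j \text{ are recalled in protocol } k \text{ and with } |i-j|-1 \text{ intervening nodes;} \\ n_k + 1 & \text{if either } o_i \text{ or } o_j \text{ is not present in protocol } k. \end{cases}$$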



Using this definition and summing over all protocols, small values of
C(o_i, o_j) denote the more similar object pairs.
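
The first of the two definitions above, in which C(o_i, o_j) counts the protocols where two items are recalled consecutively, can be sketched in Python. The function name consecutive_counts and the three protocols are hypothetical; items are numbered from 0, and a protocol may omit some items of S.

```python
def consecutive_counts(protocols, n):
    """Overall proximity C: for each pair of items, the number of protocols
    in which the two items were recalled one right after the other."""
    c = [[0] * n for _ in range(n)]
    for order in protocols:
        for a, b in zip(order, order[1:]):
            c[a][b] += 1
            c[b][a] += 1   # keep the matrix symmetric
    return c


# Hypothetical data: three protocols over four items; the last omits item 3.
protocols = [[0, 1, 2, 3],
             [1, 0, 3, 2],
             [0, 2, 1]]
c = consecutive_counts(protocols, 4)
for row in c:
    print(row)
```

Each item appears at most once in a protocol, so a given pair can be adjacent at most once per protocol and each entry of the matrix is a count of protocols.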
In any event, given the final proximity measure C(o_i, o_j), a general
index may be defined by, say, A:

$$A = \frac{1}{2}\sum_{j=1}^{n}\sum_{i=1}^{n} q(o_i,o_j)\,C(o_i,o_j) = \sum_{i<j} q(o_i,o_j)\,C(o_i,o_j).$$


Mantel's formulas immediately provide the randomization mean and variance
for A. Let

$$A_1 = \Big(\sum_{j=1}^{n}\sum_{i=1}^{n} q(o_i,o_j)\Big)^2; \quad A_2 = \sum_{j=1}^{n}\Big(\sum_{i=1}^{n} q(o_i,o_j)\Big)^2; \quad A_3 = \sum_{j=1}^{n}\sum_{i=1}^{n} q^2(o_i,o_j);$$

$$B_1 = \Big(\sum_{j=1}^{n}\sum_{i=1}^{n} C(o_i,o_j)\Big)^2; \quad B_2 = \sum_{j=1}^{n}\Big(\sum_{i=1}^{n} C(o_i,o_j)\Big)^2; \quad B_3 = \sum_{j=1}^{n}\sum_{i=1}^{n} C^2(o_i,o_j).$$

Then

$$E(A) = \frac{1}{2n(n-1)}\sqrt{A_1 B_1};$$

$$Var(A) = -\Big[\frac{1}{2n(n-1)}\Big]^2 A_1 B_1 + \frac{1}{2n(n-1)}\,A_3 B_3 + \frac{1}{n(n-1)(n-2)}\,[A_2 - A_3][B_2 - B_3] + \frac{1}{4n(n-1)(n-2)(n-3)}\,[A_1 - 4A_2 + 2A_3][B_1 - 4B_2 + 2B_3].$$


With these parameters, a normalized Z statistic may be defined in the
usual way:

$$Z = \frac{A - E(A)}{\sqrt{Var(A)}}.$$

Once again, this Z statistic should provide a convenient large-sample
approximation to the exact permutation test of the hypothesis that the
measures q(o_i, o_j) and C(o_i, o_j) are unrelated, or more simply, Z could be
used as a normalized group measure of clustering in free recall.
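
The general randomization mean and variance can be sketched in Python and checked against the two single-protocol cases treated earlier. The function name mantel_moments is hypothetical; integer weights are assumed so that exact fractions can be used, the mean is written as the product of the two grand totals (which equals the square root of A_1 B_1 when all weights are nonnegative), and n must be at least 4 because the last variance term divides by n - 3.

```python
from fractions import Fraction


def mantel_moments(q, c):
    """Randomization mean and variance of (1/2) * sum_ij q_ij * c_ij
    for symmetric matrices q and c with zero diagonals (n >= 4)."""
    n = len(q)

    def summaries(m):
        rows = [sum(r) for r in m]
        total = sum(rows)
        return (total,
                total ** 2,                         # A_1 or B_1
                sum(r * r for r in rows),           # A_2 or B_2
                sum(x * x for r in m for x in r))   # A_3 or B_3

    sq, a1, a2, a3 = summaries(q)
    sc, b1, b2, b3 = summaries(c)
    d = n * (n - 1)
    mean = Fraction(sq * sc, 2 * d)
    var = (-mean ** 2
           + Fraction(a3 * b3, 2 * d)
           + Fraction((a2 - a3) * (b2 - b3), d * (n - 2))
           + Fraction((a1 - 4 * a2 + 2 * a3) * (b1 - 4 * b2 + 2 * b3),
                      4 * d * (n - 2) * (n - 3)))
    return mean, var


# Four items, categories {0, 1} and {2, 3}.
q = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]]
c_path = [[1 if abs(i - j) == 1 else 0 for j in range(4)] for i in range(4)]
c_dist = [[abs(i - j) for j in range(4)] for i in range(4)]

print(mantel_moments(q, c_path))  # mean 1, variance 2/3 (the Gamma case)
print(mantel_moments(q, c_dist))  # mean 10/3, variance 8/9 (the Omega case)
```

With the adjacency matrix of a single protocol the general expressions reproduce the mean and variance of Γ, and with positional separations they reproduce those of Ω, which is one way to confirm that the single-protocol formulas are special cases.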

Although the general statistic A may be used to index clustering in
free recall for a group of subjects or for a single subject over trials, a
more traditional approach to group analyses should be noted. Here the
single protocol statistics, say Γ or Ω, are calculated and used in
traditional analysis of variance paradigms to assess group differences,
trends, and so on. Clearly, the use of a clustering index as a dependent
variable is a much more general technique than the simple randomization
extension defined through the single index A.
VIII

DISCUSSION

Although the inference problem discussed in this paper has been
framed completely within the free recall paradigm, in actuality the task
of comparing two graphs can be made much more general. We have indicated
earlier that in the free recall paradigm, the subject response graph, R,
is compared with the stimulus structure graph, G, with the latter defined
either by the experimenter or by the subject. In some cases, however,
the stimulus structure graph may be of interest in its own right, namely,
when an investigator wishes to compare some a priori structure with the
subject's perception of it (see, for example, Anglin, 1970). Suppose
the subject is asked to sort the elements of S into groups of similar
objects, as is done in the Mandler (1967) paradigm. An index of
correspondence between the subject's sort and the a priori structure
characterized by G can be obtained in the same way that Γ or Ω were defined
earlier.

In summary, the comparison of two graphs R and G appears to offer a
very general inference technique that can be identified as basic to
many experimental situations in the behavioral sciences. Given the elegance
of the associated randomization procedures, this framework is capable of
providing an extremely general inference strategy. The necessary
correspondences are now being developed in detail by the authors, and
hopefully, this work will provide the applied researcher with a new set of
useful and powerful analytical tools.